pySCENIC protocol: PBMC10k downstream analyses

August 2019

Dataset: 10k PBMCs from a Healthy Donor available from 10x Genomics (here).

This notebook uses a loom file generated from the first part of the SCENIC protocol, described in: PBMC10k_SCENIC-protocol-CLI.ipynb

In [1]:
# import dependencies
import os
import numpy as np
import pandas as pd
import scanpy as sc
import loompy as lp
from MulticoreTSNE import MulticoreTSNE as TSNE
import json
import base64
import zlib
from pyscenic.plotting import plot_binarization
from pyscenic.export import add_scenic_metadata
from pyscenic.cli.utils import load_signatures
import matplotlib as mpl
import matplotlib.pyplot as plt
from scanpy.plotting._tools.scatterplots import plot_scatter
import seaborn as sns
/usr/local/lib/python3.7/site-packages/dask/config.py:161: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  data = yaml.load(f.read()) or {}

set variables for file paths to read from and write to:

In [32]:
# set a working directory
wdir = "/mnt/beegfs/data/users/vsc32528/pbmc10kprotocol/"
wdir = '/ddn1/vol1/staging/leuven/stg_00002/lcb/cflerin/testruns/scenic-protocol/releasetest2/'
os.chdir( wdir )


# path to loom output, generated from a combination of Scanpy and pySCENIC results:
f_final_loom = 'PBMC10k.loom'
In [3]:
sc.settings.verbosity = 3 # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.logging.print_versions()
sc.settings.set_figure_params(dpi=150)
scanpy==1.4.4.post1 anndata==0.6.22 umap==0.3.9 numpy==1.16.2 scipy==1.3.0 pandas==0.23.4 scikit-learn==0.21.2 statsmodels==0.10.0 python-igraph==0.7.1 louvain==0.6.1

Extract relevant data from the integrated loom file

In [4]:
# scenic output
lf = lp.connect( f_final_loom, mode='r', validate=False )
meta = json.loads(zlib.decompress(base64.b64decode( lf.attrs.MetaData )))
exprMat = pd.DataFrame( lf[:,:], index=lf.ra.Gene, columns=lf.ca.CellID).T
auc_mtx = pd.DataFrame( lf.ca.RegulonsAUC, index=lf.ca.CellID)
In [5]:
# create a dictionary of regulons:
regulons = {}
for i,r in pd.DataFrame(lf.ra.Regulons,index=lf.ra.Gene).iteritems():
    regulons[i] =  list(r[r==1].index.values)
In [6]:
# cell annotations from the loom column attributes:
cellAnnot = pd.concat(
    [
        pd.DataFrame( lf.ca.Celltype_Garnett, index=lf.ca.CellID ),
        pd.DataFrame( lf.ca.ClusterID, index=lf.ca.CellID ),
        pd.DataFrame( lf.ca.Louvain_clusters_Scanpy, index=lf.ca.CellID ),
        pd.DataFrame( lf.ca.Percent_mito, index=lf.ca.CellID ),
        pd.DataFrame( lf.ca.nGene, index=lf.ca.CellID ),
        pd.DataFrame( lf.ca.nUMI, index=lf.ca.CellID ),
    ],
    axis=1
)
cellAnnot.columns = [
 'Celltype_Garnett',
 'ClusterID',
 'Louvain_clusters_Scanpy',
 'Percent_mito',
 'nGene',
 'nUMI']
In [7]:
# capture embeddings:
dr = [
    pd.DataFrame( lf.ca.Embedding, index=lf.ca.CellID )
]
dr_names = [
    meta['embeddings'][0]['name'].replace(" ","_")
]

# add other embeddings
drx = pd.DataFrame( lf.ca.Embeddings_X, index=lf.ca.CellID )
dry = pd.DataFrame( lf.ca.Embeddings_Y, index=lf.ca.CellID )

for i in range( len(drx.columns) ):
    dr.append( pd.concat( [ drx.iloc[:,i], dry.iloc[:,i] ], sort=False, axis=1, join='outer' ))
    dr_names.append( meta['embeddings'][i+1]['name'].replace(" ","_").replace('/','-') )

# rename columns:
for i,x in enumerate( dr ):
    x.columns = ['X','Y']
In [8]:
lf.close()

Alternately, we can load this data into a scanpy.AnnData object

This can be done directly from the integrated loom file, with a few modifications to allow for SCENIC- and SCope-specific loom attributes:

In [9]:
adata = sc.read( f_final_loom, validate=False)

# drop the embeddings and extra attributes from the obs object
adata.obs.drop( ['Embedding','Embeddings_X','Embeddings_Y','RegulonsAUC'], axis=1, inplace=True )
--> This might be very slow. Consider passing `cache=True`, which enables much faster reading from a cache file.
In [10]:
# add the embeddings into the adata.obsm object
for i,x in enumerate( dr ):
    adata.obsm[ 'X_'+dr_names[i] ] = x.as_matrix()
/usr/local/bin/ipython:3: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  import re
In [11]:
sc.utils.sanitize_anndata( adata )
... storing 'Celltype_Garnett' as categorical
... storing 'ClusterID' as categorical
... storing 'Clusterings' as categorical
... storing 'Louvain_clusters_Scanpy' as categorical
... storing 'Regulons' as categorical

We can also add all metadata derived from SCENIC to the scanpy.AnnData object.

In [12]:
# load the regulons from a file using the load_signatures function
sig = load_signatures('reg.csv')
add_scenic_metadata(adata, auc_mtx, sig)
Out[12]:
AnnData object with n_obs × n_vars = 10280 × 20292 
    obs: 'Celltype_Garnett', 'ClusterID', 'Clusterings', 'Louvain_clusters_Scanpy', 'Percent_mito', 'nGene', 'nUMI', 'Regulon(AHR_(+))', 'Regulon(AIRE_(+))', 'Regulon(AR_(+))', 'Regulon(ARID3A_(+))', 'Regulon(ARNT_(+))', 'Regulon(ARNTL2_(+))', 'Regulon(ASCL2_(+))', 'Regulon(ATF1_(+))', 'Regulon(ATF3_(+))', 'Regulon(ATF4_(+))', 'Regulon(ATF5_(+))', 'Regulon(ATF6_(+))', 'Regulon(ATF6B_(+))', 'Regulon(ATF7_(+))', 'Regulon(BACH1_(+))', 'Regulon(BACH2_(+))', 'Regulon(BATF_(+))', 'Regulon(BATF3_(+))', 'Regulon(BCL11A_(+))', 'Regulon(BCL3_(+))', 'Regulon(BCLAF1_(+))', 'Regulon(BHLHE40_(+))', 'Regulon(BNC2_(+))', 'Regulon(BRCA1_(+))', 'Regulon(BRF1_(+))', 'Regulon(BRF2_(+))', 'Regulon(CEBPA_(+))', 'Regulon(CEBPB_(+))', 'Regulon(CEBPD_(+))', 'Regulon(CEBPE_(+))', 'Regulon(CEBPG_(+))', 'Regulon(CEBPZ_(+))', 'Regulon(CERS4_(+))', 'Regulon(CHD2_(+))', 'Regulon(CHURC1_(+))', 'Regulon(CLOCK_(+))', 'Regulon(CNOT4_(+))', 'Regulon(CPSF4_(+))', 'Regulon(CREB1_(+))', 'Regulon(CREB3_(+))', 'Regulon(CREB3L2_(+))', 'Regulon(CREB5_(+))', 'Regulon(CREBL2_(+))', 'Regulon(CTCF_(+))', 'Regulon(CUX1_(+))', 'Regulon(CXXC1_(+))', 'Regulon(DBP_(+))', 'Regulon(DDIT3_(+))', 'Regulon(DEAF1_(+))', 'Regulon(DLX2_(+))', 'Regulon(E2F1_(+))', 'Regulon(E2F2_(+))', 'Regulon(E2F3_(+))', 'Regulon(E2F4_(+))', 'Regulon(E2F5_(+))', 'Regulon(E2F6_(+))', 'Regulon(E4F1_(+))', 'Regulon(EBF1_(+))', 'Regulon(EGR1_(+))', 'Regulon(EGR2_(+))', 'Regulon(EGR3_(+))', 'Regulon(EGR4_(+))', 'Regulon(ELF1_(+))', 'Regulon(ELF2_(+))', 'Regulon(ELF3_(+))', 'Regulon(ELF4_(+))', 'Regulon(ELK1_(+))', 'Regulon(ELK3_(+))', 'Regulon(ELK4_(+))', 'Regulon(EOMES_(+))', 'Regulon(EP300_(+))', 'Regulon(ERF_(+))', 'Regulon(ERG_(+))', 'Regulon(ESR1_(+))', 'Regulon(ESR2_(+))', 'Regulon(ESRRA_(+))', 'Regulon(ETS1_(+))', 'Regulon(ETS2_(+))', 'Regulon(ETV2_(+))', 'Regulon(ETV3_(+))', 'Regulon(ETV5_(+))', 'Regulon(ETV6_(+))', 'Regulon(ETV7_(+))', 'Regulon(FIZ1_(+))', 'Regulon(FLI1_(+))', 'Regulon(FOS_(+))', 'Regulon(FOSB_(+))', 'Regulon(FOSL1_(+))', 'Regulon(FOSL2_(+))', 'Regulon(FOXD1_(+))', 'Regulon(FOXD2_(+))', 'Regulon(FOXJ2_(+))', 'Regulon(FOXK2_(+))', 'Regulon(FOXN2_(+))', 'Regulon(FOXN3_(+))', 'Regulon(FOXO1_(+))', 'Regulon(FOXO3_(+))', 'Regulon(FOXP1_(+))', 'Regulon(FOXP3_(+))', 'Regulon(FOXP4_(+))', 'Regulon(GABPA_(+))', 'Regulon(GABPB1_(+))', 'Regulon(GATA1_(+))', 'Regulon(GATA2_(+))', 'Regulon(GATA3_(+))', 'Regulon(GFI1_(+))', 'Regulon(GLI1_(+))', 'Regulon(GRHL1_(+))', 'Regulon(GTF2A2_(+))', 'Regulon(GTF2B_(+))', 'Regulon(GTF3C2_(+))', 'Regulon(HDAC2_(+))', 'Regulon(HES6_(+))', 'Regulon(HES7_(+))', 'Regulon(HESX1_(+))', 'Regulon(HIC1_(+))', 'Regulon(HIC2_(+))', 'Regulon(HIF1A_(+))', 'Regulon(HINFP_(+))', 'Regulon(HIVEP1_(+))', 'Regulon(HMGA1_(+))', 'Regulon(HMGB3_(+))', 'Regulon(HOXA10_(+))', 'Regulon(HOXA5_(+))', 'Regulon(HOXA9_(+))', 'Regulon(HOXB2_(+))', 'Regulon(HOXB3_(+))', 'Regulon(HOXB4_(+))', 'Regulon(HSF1_(+))', 'Regulon(HSF2_(+))', 'Regulon(IKZF1_(+))', 'Regulon(IRF1_(+))', 'Regulon(IRF2_(+))', 'Regulon(IRF3_(+))', 'Regulon(IRF4_(+))', 'Regulon(IRF5_(+))', 'Regulon(IRF7_(+))', 'Regulon(IRF8_(+))', 'Regulon(IRF9_(+))', 'Regulon(IRX3_(+))', 'Regulon(JDP2_(+))', 'Regulon(JUN_(+))', 'Regulon(JUNB_(+))', 'Regulon(JUND_(+))', 'Regulon(KDM2B_(+))', 'Regulon(KDM4A_(+))', 'Regulon(KDM5A_(+))', 'Regulon(KLF11_(+))', 'Regulon(KLF12_(+))', 'Regulon(KLF16_(+))', 'Regulon(KLF2_(+))', 'Regulon(KLF3_(+))', 'Regulon(KLF4_(+))', 'Regulon(KLF5_(+))', 'Regulon(KLF6_(+))', 'Regulon(KLF7_(+))', 'Regulon(KLF8_(+))', 'Regulon(KLF9_(+))', 'Regulon(LEF1_(+))', 'Regulon(MAF_(+))', 'Regulon(MAFB_(+))', 'Regulon(MAFF_(+))', 'Regulon(MAFG_(+))', 'Regulon(MAFK_(+))', 'Regulon(MAX_(+))', 'Regulon(MAZ_(+))', 'Regulon(MECOM_(+))', 'Regulon(MEF2C_(+))', 'Regulon(MEF2D_(+))', 'Regulon(MEOX1_(+))', 'Regulon(MGA_(+))', 'Regulon(MITF_(+))', 'Regulon(MLX_(+))', 'Regulon(MXD3_(+))', 'Regulon(MXD4_(+))', 'Regulon(MXI1_(+))', 'Regulon(MYBL2_(+))', 'Regulon(MYC_(+))', 'Regulon(MYCN_(+))', 'Regulon(MYEF2_(+))', 'Regulon(MZF1_(+))', 'Regulon(NANOS1_(+))', 'Regulon(NFAT5_(+))', 'Regulon(NFATC2_(+))', 'Regulon(NFE2_(+))', 'Regulon(NFE2L1_(+))', 'Regulon(NFE2L2_(+))', 'Regulon(NFE2L3_(+))', 'Regulon(NFIB_(+))', 'Regulon(NFIC_(+))', 'Regulon(NFIL3_(+))', 'Regulon(NFKB1_(+))', 'Regulon(NFKB2_(+))', 'Regulon(NFYA_(+))', 'Regulon(NFYB_(+))', 'Regulon(NHLH1_(+))', 'Regulon(NR1D1_(+))', 'Regulon(NR1H2_(+))', 'Regulon(NR1H3_(+))', 'Regulon(NR1I3_(+))', 'Regulon(NR2C1_(+))', 'Regulon(NR2C2_(+))', 'Regulon(NR2F6_(+))', 'Regulon(NR3C1_(+))', 'Regulon(NR5A1_(+))', 'Regulon(NR6A1_(+))', 'Regulon(NRF1_(+))', 'Regulon(NRL_(+))', 'Regulon(OLIG1_(+))', 'Regulon(OSR2_(+))', 'Regulon(PATZ1_(+))', 'Regulon(PAX2_(+))', 'Regulon(PAX5_(+))', 'Regulon(PBX1_(+))', 'Regulon(PBX2_(+))', 'Regulon(PBX3_(+))', 'Regulon(PHF8_(+))', 'Regulon(PHTF1_(+))', 'Regulon(PIK3C3_(+))', 'Regulon(PLAGL1_(+))', 'Regulon(POLE3_(+))', 'Regulon(POLR2A_(+))', 'Regulon(POLR3G_(+))', 'Regulon(POU2F1_(+))', 'Regulon(POU2F2_(+))', 'Regulon(POU5F1B_(+))', 'Regulon(POU6F1_(+))', 'Regulon(PPARA_(+))', 'Regulon(PPARD_(+))', 'Regulon(PRDM1_(+))', 'Regulon(RAD21_(+))', 'Regulon(RARA_(+))', 'Regulon(RARG_(+))', 'Regulon(RB1_(+))', 'Regulon(RBBP5_(+))', 'Regulon(REL_(+))', 'Regulon(RELA_(+))', 'Regulon(RELB_(+))', 'Regulon(REST_(+))', 'Regulon(RFX1_(+))', 'Regulon(RFX2_(+))', 'Regulon(RFX3_(+))', 'Regulon(RFX5_(+))', 'Regulon(RFX7_(+))', 'Regulon(RORA_(+))', 'Regulon(RORB_(+))', 'Regulon(RORC_(+))', 'Regulon(RUNX1_(+))', 'Regulon(RUNX2_(+))', 'Regulon(RUNX3_(+))', 'Regulon(RXRA_(+))', 'Regulon(RXRB_(+))', 'Regulon(SF1_(+))', 'Regulon(SIN3A_(+))', 'Regulon(SMAD3_(+))', 'Regulon(SMARCC2_(+))', 'Regulon(SMC3_(+))', 'Regulon(SNAI1_(+))', 'Regulon(SOCS4_(+))', 'Regulon(SOX15_(+))', 'Regulon(SOX8_(+))', 'Regulon(SP1_(+))', 'Regulon(SP2_(+))', 'Regulon(SP3_(+))', 'Regulon(SP4_(+))', 'Regulon(SPI1_(+))', 'Regulon(SPIB_(+))', 'Regulon(SPIC_(+))', 'Regulon(SREBF1_(+))', 'Regulon(SREBF2_(+))', 'Regulon(SRF_(+))', 'Regulon(STAT1_(+))', 'Regulon(STAT2_(+))', 'Regulon(STAT3_(+))', 'Regulon(STAT5A_(+))', 'Regulon(STAT5B_(+))', 'Regulon(STAT6_(+))', 'Regulon(TAF1_(+))', 'Regulon(TAF7_(+))', 'Regulon(TAL1_(+))', 'Regulon(TBP_(+))', 'Regulon(TBPL1_(+))', 'Regulon(TBX19_(+))', 'Regulon(TBX21_(+))', 'Regulon(TBX6_(+))', 'Regulon(TCF12_(+))', 'Regulon(TCF3_(+))', 'Regulon(TCF7_(+))', 'Regulon(TCF7L1_(+))', 'Regulon(TCF7L2_(+))', 'Regulon(TEAD3_(+))', 'Regulon(TEF_(+))', 'Regulon(TFAP4_(+))', 'Regulon(TFCP2L1_(+))', 'Regulon(TFDP1_(+))', 'Regulon(TFE3_(+))', 'Regulon(TFEB_(+))', 'Regulon(TFEC_(+))', 'Regulon(TGIF2_(+))', 'Regulon(THAP1_(+))', 'Regulon(THRA_(+))', 'Regulon(TOB2_(+))', 'Regulon(TP53_(+))', 'Regulon(TP63_(+))', 'Regulon(TRIM28_(+))', 'Regulon(UBTF_(+))', 'Regulon(USF1_(+))', 'Regulon(VEZF1_(+))', 'Regulon(VPS4B_(+))', 'Regulon(XBP1_(+))', 'Regulon(YEATS4_(+))', 'Regulon(YY1_(+))', 'Regulon(YY2_(+))', 'Regulon(ZBTB14_(+))', 'Regulon(ZBTB18_(+))', 'Regulon(ZBTB33_(+))', 'Regulon(ZBTB41_(+))', 'Regulon(ZBTB45_(+))', 'Regulon(ZBTB7A_(+))', 'Regulon(ZBTB7B_(+))', 'Regulon(ZEB1_(+))', 'Regulon(ZFHX3_(+))', 'Regulon(ZFP1_(+))', 'Regulon(ZFP3_(+))', 'Regulon(ZFX_(+))', 'Regulon(ZFY_(+))', 'Regulon(ZKSCAN3_(+))', 'Regulon(ZKSCAN4_(+))', 'Regulon(ZKSCAN8_(+))', 'Regulon(ZNF101_(+))', 'Regulon(ZNF124_(+))', 'Regulon(ZNF135_(+))', 'Regulon(ZNF138_(+))', 'Regulon(ZNF143_(+))', 'Regulon(ZNF16_(+))', 'Regulon(ZNF260_(+))', 'Regulon(ZNF263_(+))', 'Regulon(ZNF266_(+))', 'Regulon(ZNF267_(+))', 'Regulon(ZNF274_(+))', 'Regulon(ZNF283_(+))', 'Regulon(ZNF30_(+))', 'Regulon(ZNF304_(+))', 'Regulon(ZNF362_(+))', 'Regulon(ZNF384_(+))', 'Regulon(ZNF407_(+))', 'Regulon(ZNF410_(+))', 'Regulon(ZNF429_(+))', 'Regulon(ZNF470_(+))', 'Regulon(ZNF471_(+))', 'Regulon(ZNF484_(+))', 'Regulon(ZNF501_(+))', 'Regulon(ZNF513_(+))', 'Regulon(ZNF524_(+))', 'Regulon(ZNF549_(+))', 'Regulon(ZNF550_(+))', 'Regulon(ZNF555_(+))', 'Regulon(ZNF568_(+))', 'Regulon(ZNF573_(+))', 'Regulon(ZNF578_(+))', 'Regulon(ZNF595_(+))', 'Regulon(ZNF607_(+))', 'Regulon(ZNF667_(+))', 'Regulon(ZNF709_(+))', 'Regulon(ZNF71_(+))', 'Regulon(ZNF721_(+))', 'Regulon(ZNF76_(+))', 'Regulon(ZNF768_(+))', 'Regulon(ZNF81_(+))', 'Regulon(ZNF831_(+))', 'Regulon(ZNF836_(+))', 'Regulon(ZNF891_(+))', 'Regulon(ZNF91_(+))', 'Regulon(ZSCAN2_(+))', 'Regulon(ZSCAN31_(+))'
    var: 'Regulons', 'Regulon(AHR(+))', 'Regulon(AIRE(+))', 'Regulon(AR(+))', 'Regulon(ARID3A(+))', 'Regulon(ARNT(+))', 'Regulon(ARNTL2(+))', 'Regulon(ASCL2(+))', 'Regulon(ATF1(+))', 'Regulon(ATF3(+))', 'Regulon(ATF4(+))', 'Regulon(ATF5(+))', 'Regulon(ATF6(+))', 'Regulon(ATF6B(+))', 'Regulon(ATF7(+))', 'Regulon(BACH1(+))', 'Regulon(BACH2(+))', 'Regulon(BATF(+))', 'Regulon(BATF3(+))', 'Regulon(BCL11A(+))', 'Regulon(BCL3(+))', 'Regulon(BCLAF1(+))', 'Regulon(BHLHE40(+))', 'Regulon(BNC2(+))', 'Regulon(BRCA1(+))', 'Regulon(BRF1(+))', 'Regulon(BRF2(+))', 'Regulon(CEBPA(+))', 'Regulon(CEBPB(+))', 'Regulon(CEBPD(+))', 'Regulon(CEBPE(+))', 'Regulon(CEBPG(+))', 'Regulon(CEBPZ(+))', 'Regulon(CERS4(+))', 'Regulon(CHD2(+))', 'Regulon(CHURC1(+))', 'Regulon(CLOCK(+))', 'Regulon(CNOT4(+))', 'Regulon(CPSF4(+))', 'Regulon(CREB1(+))', 'Regulon(CREB3(+))', 'Regulon(CREB3L2(+))', 'Regulon(CREB5(+))', 'Regulon(CREBL2(+))', 'Regulon(CTCF(+))', 'Regulon(CUX1(+))', 'Regulon(CXXC1(+))', 'Regulon(DBP(+))', 'Regulon(DDIT3(+))', 'Regulon(DEAF1(+))', 'Regulon(DLX2(+))', 'Regulon(E2F1(+))', 'Regulon(E2F2(+))', 'Regulon(E2F3(+))', 'Regulon(E2F4(+))', 'Regulon(E2F5(+))', 'Regulon(E2F6(+))', 'Regulon(E4F1(+))', 'Regulon(EBF1(+))', 'Regulon(EGR1(+))', 'Regulon(EGR2(+))', 'Regulon(EGR3(+))', 'Regulon(EGR4(+))', 'Regulon(ELF1(+))', 'Regulon(ELF2(+))', 'Regulon(ELF3(+))', 'Regulon(ELF4(+))', 'Regulon(ELK1(+))', 'Regulon(ELK3(+))', 'Regulon(ELK4(+))', 'Regulon(EOMES(+))', 'Regulon(EP300(+))', 'Regulon(ERF(+))', 'Regulon(ERG(+))', 'Regulon(ESR1(+))', 'Regulon(ESR2(+))', 'Regulon(ESRRA(+))', 'Regulon(ETS1(+))', 'Regulon(ETS2(+))', 'Regulon(ETV2(+))', 'Regulon(ETV3(+))', 'Regulon(ETV5(+))', 'Regulon(ETV6(+))', 'Regulon(ETV7(+))', 'Regulon(FIZ1(+))', 'Regulon(FLI1(+))', 'Regulon(FOS(+))', 'Regulon(FOSB(+))', 'Regulon(FOSL1(+))', 'Regulon(FOSL2(+))', 'Regulon(FOXD1(+))', 'Regulon(FOXD2(+))', 'Regulon(FOXJ2(+))', 'Regulon(FOXK2(+))', 'Regulon(FOXN2(+))', 'Regulon(FOXN3(+))', 'Regulon(FOXO1(+))', 'Regulon(FOXO3(+))', 'Regulon(FOXP1(+))', 'Regulon(FOXP3(+))', 'Regulon(FOXP4(+))', 'Regulon(GABPA(+))', 'Regulon(GABPB1(+))', 'Regulon(GATA1(+))', 'Regulon(GATA2(+))', 'Regulon(GATA3(+))', 'Regulon(GFI1(+))', 'Regulon(GLI1(+))', 'Regulon(GRHL1(+))', 'Regulon(GTF2A2(+))', 'Regulon(GTF2B(+))', 'Regulon(GTF3C2(+))', 'Regulon(HDAC2(+))', 'Regulon(HES6(+))', 'Regulon(HES7(+))', 'Regulon(HESX1(+))', 'Regulon(HIC1(+))', 'Regulon(HIC2(+))', 'Regulon(HIF1A(+))', 'Regulon(HINFP(+))', 'Regulon(HIVEP1(+))', 'Regulon(HMGA1(+))', 'Regulon(HMGB3(+))', 'Regulon(HOXA10(+))', 'Regulon(HOXA5(+))', 'Regulon(HOXA9(+))', 'Regulon(HOXB2(+))', 'Regulon(HOXB3(+))', 'Regulon(HOXB4(+))', 'Regulon(HSF1(+))', 'Regulon(HSF2(+))', 'Regulon(IKZF1(+))', 'Regulon(IRF1(+))', 'Regulon(IRF2(+))', 'Regulon(IRF3(+))', 'Regulon(IRF4(+))', 'Regulon(IRF5(+))', 'Regulon(IRF7(+))', 'Regulon(IRF8(+))', 'Regulon(IRF9(+))', 'Regulon(IRX3(+))', 'Regulon(JDP2(+))', 'Regulon(JUN(+))', 'Regulon(JUNB(+))', 'Regulon(JUND(+))', 'Regulon(KDM2B(+))', 'Regulon(KDM4A(+))', 'Regulon(KDM5A(+))', 'Regulon(KLF11(+))', 'Regulon(KLF12(+))', 'Regulon(KLF16(+))', 'Regulon(KLF2(+))', 'Regulon(KLF3(+))', 'Regulon(KLF4(+))', 'Regulon(KLF5(+))', 'Regulon(KLF6(+))', 'Regulon(KLF7(+))', 'Regulon(KLF8(+))', 'Regulon(KLF9(+))', 'Regulon(LEF1(+))', 'Regulon(MAF(+))', 'Regulon(MAFB(+))', 'Regulon(MAFF(+))', 'Regulon(MAFG(+))', 'Regulon(MAFK(+))', 'Regulon(MAX(+))', 'Regulon(MAZ(+))', 'Regulon(MECOM(+))', 'Regulon(MEF2C(+))', 'Regulon(MEF2D(+))', 'Regulon(MEOX1(+))', 'Regulon(MGA(+))', 'Regulon(MITF(+))', 'Regulon(MLX(+))', 'Regulon(MXD3(+))', 'Regulon(MXD4(+))', 'Regulon(MXI1(+))', 'Regulon(MYBL2(+))', 'Regulon(MYC(+))', 'Regulon(MYCN(+))', 'Regulon(MYEF2(+))', 'Regulon(MZF1(+))', 'Regulon(NANOS1(+))', 'Regulon(NFAT5(+))', 'Regulon(NFATC2(+))', 'Regulon(NFE2(+))', 'Regulon(NFE2L1(+))', 'Regulon(NFE2L2(+))', 'Regulon(NFE2L3(+))', 'Regulon(NFIB(+))', 'Regulon(NFIC(+))', 'Regulon(NFIL3(+))', 'Regulon(NFKB1(+))', 'Regulon(NFKB2(+))', 'Regulon(NFYA(+))', 'Regulon(NFYB(+))', 'Regulon(NHLH1(+))', 'Regulon(NR1D1(+))', 'Regulon(NR1H2(+))', 'Regulon(NR1H3(+))', 'Regulon(NR1I3(+))', 'Regulon(NR2C1(+))', 'Regulon(NR2C2(+))', 'Regulon(NR2F6(+))', 'Regulon(NR3C1(+))', 'Regulon(NR5A1(+))', 'Regulon(NR6A1(+))', 'Regulon(NRF1(+))', 'Regulon(NRL(+))', 'Regulon(OLIG1(+))', 'Regulon(OSR2(+))', 'Regulon(PATZ1(+))', 'Regulon(PAX2(+))', 'Regulon(PAX5(+))', 'Regulon(PBX1(+))', 'Regulon(PBX2(+))', 'Regulon(PBX3(+))', 'Regulon(PHF8(+))', 'Regulon(PHTF1(+))', 'Regulon(PIK3C3(+))', 'Regulon(PLAGL1(+))', 'Regulon(POLE3(+))', 'Regulon(POLR2A(+))', 'Regulon(POLR3G(+))', 'Regulon(POU2F1(+))', 'Regulon(POU2F2(+))', 'Regulon(POU5F1B(+))', 'Regulon(POU6F1(+))', 'Regulon(PPARA(+))', 'Regulon(PPARD(+))', 'Regulon(PRDM1(+))', 'Regulon(RAD21(+))', 'Regulon(RARA(+))', 'Regulon(RARG(+))', 'Regulon(RB1(+))', 'Regulon(RBBP5(+))', 'Regulon(REL(+))', 'Regulon(RELA(+))', 'Regulon(RELB(+))', 'Regulon(REST(+))', 'Regulon(RFX1(+))', 'Regulon(RFX2(+))', 'Regulon(RFX3(+))', 'Regulon(RFX5(+))', 'Regulon(RFX7(+))', 'Regulon(RORA(+))', 'Regulon(RORB(+))', 'Regulon(RORC(+))', 'Regulon(RUNX1(+))', 'Regulon(RUNX2(+))', 'Regulon(RUNX3(+))', 'Regulon(RXRA(+))', 'Regulon(RXRB(+))', 'Regulon(SF1(+))', 'Regulon(SIN3A(+))', 'Regulon(SMAD3(+))', 'Regulon(SMARCC2(+))', 'Regulon(SMC3(+))', 'Regulon(SNAI1(+))', 'Regulon(SOCS4(+))', 'Regulon(SOX15(+))', 'Regulon(SOX8(+))', 'Regulon(SP1(+))', 'Regulon(SP2(+))', 'Regulon(SP3(+))', 'Regulon(SP4(+))', 'Regulon(SPI1(+))', 'Regulon(SPIB(+))', 'Regulon(SPIC(+))', 'Regulon(SREBF1(+))', 'Regulon(SREBF2(+))', 'Regulon(SRF(+))', 'Regulon(STAT1(+))', 'Regulon(STAT2(+))', 'Regulon(STAT3(+))', 'Regulon(STAT5A(+))', 'Regulon(STAT5B(+))', 'Regulon(STAT6(+))', 'Regulon(TAF1(+))', 'Regulon(TAF7(+))', 'Regulon(TAL1(+))', 'Regulon(TBP(+))', 'Regulon(TBPL1(+))', 'Regulon(TBX19(+))', 'Regulon(TBX21(+))', 'Regulon(TBX6(+))', 'Regulon(TCF12(+))', 'Regulon(TCF3(+))', 'Regulon(TCF7(+))', 'Regulon(TCF7L1(+))', 'Regulon(TCF7L2(+))', 'Regulon(TEAD3(+))', 'Regulon(TEF(+))', 'Regulon(TFAP4(+))', 'Regulon(TFCP2L1(+))', 'Regulon(TFDP1(+))', 'Regulon(TFE3(+))', 'Regulon(TFEB(+))', 'Regulon(TFEC(+))', 'Regulon(TGIF2(+))', 'Regulon(THAP1(+))', 'Regulon(THRA(+))', 'Regulon(TOB2(+))', 'Regulon(TP53(+))', 'Regulon(TP63(+))', 'Regulon(TRIM28(+))', 'Regulon(UBTF(+))', 'Regulon(USF1(+))', 'Regulon(VEZF1(+))', 'Regulon(VPS4B(+))', 'Regulon(XBP1(+))', 'Regulon(YEATS4(+))', 'Regulon(YY1(+))', 'Regulon(YY2(+))', 'Regulon(ZBTB14(+))', 'Regulon(ZBTB18(+))', 'Regulon(ZBTB33(+))', 'Regulon(ZBTB41(+))', 'Regulon(ZBTB45(+))', 'Regulon(ZBTB7A(+))', 'Regulon(ZBTB7B(+))', 'Regulon(ZEB1(+))', 'Regulon(ZFHX3(+))', 'Regulon(ZFP1(+))', 'Regulon(ZFP3(+))', 'Regulon(ZFX(+))', 'Regulon(ZFY(+))', 'Regulon(ZKSCAN3(+))', 'Regulon(ZKSCAN4(+))', 'Regulon(ZKSCAN8(+))', 'Regulon(ZNF101(+))', 'Regulon(ZNF124(+))', 'Regulon(ZNF135(+))', 'Regulon(ZNF138(+))', 'Regulon(ZNF143(+))', 'Regulon(ZNF16(+))', 'Regulon(ZNF260(+))', 'Regulon(ZNF263(+))', 'Regulon(ZNF266(+))', 'Regulon(ZNF267(+))', 'Regulon(ZNF274(+))', 'Regulon(ZNF283(+))', 'Regulon(ZNF30(+))', 'Regulon(ZNF304(+))', 'Regulon(ZNF362(+))', 'Regulon(ZNF384(+))', 'Regulon(ZNF407(+))', 'Regulon(ZNF410(+))', 'Regulon(ZNF429(+))', 'Regulon(ZNF470(+))', 'Regulon(ZNF471(+))', 'Regulon(ZNF484(+))', 'Regulon(ZNF501(+))', 'Regulon(ZNF513(+))', 'Regulon(ZNF524(+))', 'Regulon(ZNF549(+))', 'Regulon(ZNF550(+))', 'Regulon(ZNF555(+))', 'Regulon(ZNF568(+))', 'Regulon(ZNF573(+))', 'Regulon(ZNF578(+))', 'Regulon(ZNF595(+))', 'Regulon(ZNF607(+))', 'Regulon(ZNF667(+))', 'Regulon(ZNF709(+))', 'Regulon(ZNF71(+))', 'Regulon(ZNF721(+))', 'Regulon(ZNF76(+))', 'Regulon(ZNF768(+))', 'Regulon(ZNF81(+))', 'Regulon(ZNF831(+))', 'Regulon(ZNF836(+))', 'Regulon(ZNF891(+))', 'Regulon(ZNF91(+))', 'Regulon(ZSCAN2(+))', 'Regulon(ZSCAN31(+))'
    uns: 'aucell'
    obsm: 'X_highly_variable_genes_UMAP', 'X_highly_variable_genes_t-SNE', 'X_Scanpy_PC1-PC2', 'X_SCENIC_AUC_t-SNE', 'X_SCENIC_AUC_UMAP', 'X_aucell'

Display a motifs table with motif logos

View the motifs table along with motif logos

In [12]:
# helper functions (not yet integrated into pySCENIC):

from pyscenic.utils import load_motifs
import operator as op
from IPython.display import HTML, display

BASE_URL = "http://motifcollections.aertslab.org/v9/logos/"
COLUMN_NAME_LOGO = "MotifLogo"
COLUMN_NAME_MOTIF_ID = "MotifID"
COLUMN_NAME_TARGETS = "TargetGenes"

def display_logos(df: pd.DataFrame, top_target_genes: int = 3, base_url: str = BASE_URL):
    """
    :param df:
    :param base_url:
    """
    # Make sure the original dataframe is not altered.
    df = df.copy()
    
    # Add column with URLs to sequence logo.
    def create_url(motif_id):
        return '<img src="{}{}.png" style="max-height:124px;"></img>'.format(base_url, motif_id)
    df[("Enrichment", COLUMN_NAME_LOGO)] = list(map(create_url, df.index.get_level_values(COLUMN_NAME_MOTIF_ID)))
    
    # Truncate TargetGenes.
    def truncate(col_val):
        return sorted(col_val, key=op.itemgetter(1))[:top_target_genes]
    df[("Enrichment", COLUMN_NAME_TARGETS)] = list(map(truncate, df[("Enrichment", COLUMN_NAME_TARGETS)]))
    
    MAX_COL_WIDTH = pd.get_option('display.max_colwidth')
    pd.set_option('display.max_colwidth', 200)
    display(HTML(df.head().to_html(escape=False)))
    pd.set_option('display.max_colwidth', MAX_COL_WIDTH)
In [13]:
df_motifs = load_motifs('reg.csv')
In [14]:
selected_motifs = ['PAX5','TCF3','EBF1']
df_motifs_sel = df_motifs.iloc[ [ True if x in selected_motifs else False for x in df_motifs.index.get_level_values('TF') ] ,:]
In [15]:
#display_logos(df_motifs.head())
display_logos( df_motifs_sel.sort_values([('Enrichment','NES')], ascending=False).head(9))
Enrichment
AUC Annotation Context MotifSimilarityQvalue NES OrthologousIdentity RankAtMax TargetGenes MotifLogo
TF MotifID
EBF1 transfac_pro__M07351 0.153960 gene is directly annotated (hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr, top50, activating) 0.000000 6.271401 1.0 612 [(FAM129C, 10.255336830422989), (CD79A, 10.822303190915058), (PAX5, 12.393443190322412)]
factorbook__EBF1 0.138138 gene is directly annotated (hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr, top50, activating) 0.000000 5.476125 1.0 756 [(CD79A, 10.255336830422989), (FAM129C, 10.822303190915058), (P2RY10, 12.393443190322412)]
PAX5 transfac_pro__M04711 0.142637 gene is annotated for similar motif dbcorrdb__PAX5__ENCSR000BHJ_1__m2 ('PAX5 (ENCSR000BHJ-1, motif 2)'; q-value = 0.000317) (hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr, top50, activating) 0.000317 5.273931 1.0 292 [(CNR2, 25.280202022610478), (KIAA0040, 27.735438642040428), (STAP1, 32.35398721525476)]
cisbp__M1907 0.141739 gene is annotated for similar motif dbcorrdb__PAX5__ENCSR000BHJ_1__m2 ('PAX5 (ENCSR000BHJ-1, motif 2)'; q-value = 0.000547) (hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr, top50, activating) 0.000547 5.230762 1.0 2490 [(CNR2, 23.63296995309573), (E2F5, 24.70069302305977), (CD37, 25.280202022610478)]
dbcorrdb__SPI1__ENCSR000BGW_1__m1 0.140306 gene is annotated for similar motif dbcorrdb__PAX5__ENCSR000BHJ_1__m2 ('PAX5 (ENCSR000BHJ-1, motif 2)'; q-value = 0.000405) (hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr, top50, activating) 0.000405 5.161844 1.0 1596 [(ARHGAP24, 24.70069302305977), (CD22, 25.280202022610478), (BIRC3, 27.735438642040428)]

Dimensionality reduction plots

Show both highly variable genes and SCENIC UMAP plots with Louvain clustering and cell type classifications

In [34]:
sc.set_figure_params(frameon=False, dpi=600, fontsize=10, dpi_save=600)

sc.pl.scatter( adata, basis='highly_variable_genes_UMAP', 
    color=['Louvain_clusters_Scanpy','Celltype_Garnett'],
    title=['HVG - UMAP (Louvain clusters)','HVG - UMAP (Cell type)'],
    alpha=0.8,
    save='_Louvain-celltype.pdf'
    )

sc.pl.scatter( adata, basis='SCENIC_AUC_UMAP', 
    color=['Louvain_clusters_Scanpy','Celltype_Garnett'],
    title=['SCENIC - UMAP (Louvain clusters)','SCENIC - UMAP (Cell type)'], 
    alpha=0.8,
    save='_Louvain-celltype.pdf'
    )
WARNING: saving figure to file figures/highly_variable_genes_UMAP_Louvain-celltype.pdf
WARNING: saving figure to file figures/SCENIC_AUC_UMAP_Louvain-celltype.pdf

Alternately, we can plot two dimensionality reductions side-by-side

(this uses non-Scanpy plotting functions)

In [35]:
def colorMap( x, palette='bright' ):
    import natsort
    from collections import OrderedDict
    #
    n=len(set(x))
    cpalette = sns.color_palette(palette,n_colors=n )
    cdict = dict( zip( list(set(x)), cpalette ))
    cmap = [ cdict[i] for i in x ]
    cdict = OrderedDict( natsort.natsorted(cdict.items()) )
    return cmap,cdict

def drplot( dr, colorlab, ax, palette='bright', title=None, **kwargs ):
    cmap,cdict = colorMap( colorlab, palette )
    for lab,col in cdict.items():  
        ix = colorlab.loc[colorlab==lab].index
        ax.scatter( dr['X'][ix], dr['Y'][ix], c=[col]*len(ix), alpha=0.7, label=lab, edgecolors='none')
    if( title is not None ):
        ax.set_title(title, fontsize='x-large');
    #
    ax.set_xticks([])
    ax.set_yticks([])
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.spines['bottom'].set_visible(False)
    ax.spines['left'].set_visible(False)
In [36]:
plt.rcParams.update({'font.size':12})

fig, (ax1,ax2) = plt.subplots(1,2, figsize=(20,10), dpi=150 )

drplot( dr[0], colorlab=cellAnnot['Louvain_clusters_Scanpy'], ax=ax1, palette='bright', s=2, title='Highly variable genes - UMAP' )

drplot( dr[4], colorlab=cellAnnot['Louvain_clusters_Scanpy'], ax=ax2, palette='bright', s=2, title='SCENIC AUC - UMAP' )
ax2.legend(loc='right', bbox_to_anchor=(1.15, 0.5), ncol=1, markerscale=2, fontsize='x-large', frameon=False, title="Louvain\nclusters")

plt.tight_layout()
plt.savefig("PBMC10k_dimred_umap-hvg-scenic-louvain.pdf", dpi=600, bbox_inches = "tight")
---

Regulon specificity scores (RSS) across predicted cell types

In [21]:
from pyscenic.rss import regulon_specificity_scores
from pyscenic.plotting import plot_rss
import matplotlib.pyplot as plt
from adjustText import adjust_text
import seaborn as sns
from pyscenic.binarization import binarize

Calculate RSS

In [22]:
rss_cellType = regulon_specificity_scores( auc_mtx, cellAnnot['Celltype_Garnett'] )
rss_cellType
Out[22]:
AHR_(+) AIRE_(+) AR_(+) ARID3A_(+) ARNT_(+) ARNTL2_(+) ASCL2_(+) ATF1_(+) ATF3_(+) ATF4_(+) ... ZNF721_(+) ZNF76_(+) ZNF768_(+) ZNF81_(+) ZNF831_(+) ZNF836_(+) ZNF891_(+) ZNF91_(+) ZSCAN2_(+) ZSCAN31_(+)
CD4 T cells 0.366769 0.281938 0.189307 0.378747 0.393229 0.396066 0.199733 0.436353 0.377191 0.424604 ... 0.393531 0.435212 0.412642 0.403521 0.494977 0.192219 0.316267 0.461121 0.323386 0.250592
Monocytes 0.504444 0.171344 0.187820 0.445404 0.424125 0.344595 0.172537 0.376009 0.491291 0.393613 ... 0.263444 0.345634 0.400087 0.247531 0.241102 0.176174 0.222977 0.314697 0.270500 0.211815
NK cells 0.232990 0.179438 0.183765 0.248538 0.239691 0.249187 0.437472 0.248930 0.242318 0.235807 ... 0.208188 0.249632 0.243524 0.211654 0.224851 0.338831 0.199025 0.234754 0.216309 0.201620
B cells 0.286316 0.183078 0.181711 0.319743 0.288825 0.338598 0.184019 0.307793 0.279499 0.306627 ... 0.420080 0.323780 0.311000 0.372639 0.296510 0.169903 0.358619 0.323047 0.341282 0.286411
Unknown 0.254864 0.215850 0.184773 0.262545 0.261489 0.272693 0.231676 0.274696 0.259538 0.273263 ... 0.261413 0.275931 0.269088 0.267352 0.283002 0.204710 0.235159 0.277804 0.251780 0.217092
T cells 0.225652 0.212413 0.175600 0.230627 0.233498 0.234690 0.198319 0.248548 0.228995 0.242450 ... 0.241701 0.247405 0.239770 0.243832 0.265348 0.183213 0.226798 0.255984 0.227318 0.208629
Dendritic cells 0.188028 0.168784 0.366005 0.194536 0.183012 0.191987 0.175903 0.186550 0.186537 0.191970 ... 0.180408 0.185262 0.184354 0.181729 0.175224 0.183004 0.173006 0.186041 0.177321 0.186952
CD8 T cells 0.191321 0.204218 0.174881 0.196257 0.196711 0.200369 0.202929 0.204147 0.193793 0.199082 ... 0.194056 0.202665 0.198053 0.195946 0.209150 0.180606 0.195803 0.205315 0.191795 0.194565
CD34+ 0.171082 0.170337 0.172723 0.171391 0.171660 0.171696 0.169503 0.171650 0.170838 0.172520 ... 0.172552 0.171332 0.170162 0.171652 0.169781 0.176347 0.169294 0.171669 0.170309 0.168696

9 rows × 375 columns

RSS panel plot with all cell types

In [37]:
cats = sorted(list(set(cellAnnot['Celltype_Garnett'])))

fig = plt.figure(figsize=(15, 8))
for c,num in zip(cats, range(1,len(cats)+1)):
    x=rss_cellType.T[c]
    ax = fig.add_subplot(2,5,num)
    plot_rss(rss_cellType, c, top_n=5, max_n=None, ax=ax)
    ax.set_ylim( x.min()-(x.max()-x.min())*0.05 , x.max()+(x.max()-x.min())*0.05 )
    for t in ax.texts:
        t.set_fontsize(12)
    ax.set_ylabel('')
    ax.set_xlabel('')
    adjust_text(ax.texts, autoalign='xy', ha='right', arrowprops=dict(arrowstyle='-',color='lightgrey'), precision=0.001 )
 
fig.text(0.5, 0.0, 'Regulon', ha='center', va='center', size='x-large')
fig.text(0.00, 0.5, 'Regulon specificity score (RSS)', ha='center', va='center', rotation='vertical', size='x-large')
plt.tight_layout()
plt.rcParams.update({
    'figure.autolayout': True,
        'figure.titlesize': 'large' ,
        #'axes.labelsize': 'x-large',
        #'axes.titlesize':'x-large',
        'xtick.labelsize':'large',
        'ytick.labelsize':'large'
        })
#plt.savefig("PBMC10k_cellType-RSS-top5.png", dpi=150, bbox_inches = "tight")
plt.savefig("PBMC10k_cellType-RSS-top5.pdf", dpi=600, bbox_inches = "tight")
plt.show()

Select the top 5 regulons from each cell type

In [38]:
topreg = []
for i,c in enumerate(cats):
    topreg.extend(
        list(rss_cellType.T[c].sort_values(ascending=False)[:5].index)
    )
topreg = list(set(topreg))

Generate a Z-score for each regulon to enable comparison between regulons

In [39]:
auc_mtx_Z = pd.DataFrame( index=auc_mtx.index )
for col in list(auc_mtx.columns):
    auc_mtx_Z[ col ] = ( auc_mtx[col] - auc_mtx[col].mean()) / auc_mtx[col].std(ddof=0)
#auc_mtx_Z.sort_index(inplace=True)

Generate a heatmap

In [40]:
def palplot(pal, names, colors=None, size=1):
    n = len(pal)
    f, ax = plt.subplots(1, 1, figsize=(n * size, size))
    ax.imshow(np.arange(n).reshape(1, n),
              cmap=mpl.colors.ListedColormap(list(pal)),
              interpolation="nearest", aspect="auto")
    ax.set_xticks(np.arange(n) - .5)
    ax.set_yticks([-.5, .5])
    ax.set_xticklabels([])
    ax.set_yticklabels([])
    colors = n * ['k'] if colors is None else colors
    for idx, (name, color) in enumerate(zip(names, colors)):
        ax.text(0.0+idx, 0.0, name, color=color, horizontalalignment='center', verticalalignment='center')
    return f
In [41]:
colors = sns.color_palette('bright',n_colors=len(cats) )
colorsd = dict( zip( cats, colors ))
colormap = [ colorsd[x] for x in cellAnnot['Celltype_Garnett'] ]
In [42]:
sns.set()
sns.set(font_scale=0.8)
fig = palplot( colors, cats, size=1.0)
plt.savefig("PBMC10k_cellType-heatmap-legend-top5.pdf", dpi=600, bbox_inches = "tight")
In [46]:
g = sns.clustermap(auc_mtx_Z[topreg].T, annot=False,  square=False,  linecolor='gray',
    yticklabels=True, xticklabels=False, vmin=-2, vmax=6, row_colors=colormap,
    cmap="YlGnBu", figsize=(21,16) )
sns.set(font_scale=0.9)
g.cax.set_visible(True)
g.ax_heatmap.set_ylabel('')
g.ax_heatmap.set_xlabel('')
plt.savefig("PBMC10k_cellType-heatmap-top5.pdf", dpi=600, bbox_inches = "tight")

Generate a binary regulon activity matrix:

In [47]:
binary_mtx, auc_thresholds = binarize( auc_mtx, num_workers=25 )
binary_mtx.head()
Out[47]:
AHR_(+) AIRE_(+) AR_(+) ARID3A_(+) ARNT_(+) ARNTL2_(+) ASCL2_(+) ATF1_(+) ATF3_(+) ATF4_(+) ... ZNF721_(+) ZNF76_(+) ZNF768_(+) ZNF81_(+) ZNF831_(+) ZNF836_(+) ZNF891_(+) ZNF91_(+) ZSCAN2_(+) ZSCAN31_(+)
AAACCCAAGCGCCCAT-1 0 0 0 0 1 0 0 0 0 0 ... 0 0 0 1 1 1 0 0 1 0
AAACCCACAGAGTTGG-1 1 0 0 0 0 0 0 0 1 0 ... 0 0 0 0 0 0 0 0 0 0
AAACCCACAGGTATGG-1 0 0 0 0 0 0 1 1 0 0 ... 0 0 0 0 1 0 0 0 1 0
AAACCCACATAGTCAC-1 0 0 0 0 0 0 0 0 0 0 ... 1 0 0 1 1 0 1 0 0 0
AAACCCACATCCAATG-1 0 0 0 0 0 1 1 0 0 0 ... 0 0 0 1 1 0 0 0 0 0

5 rows × 375 columns

Show the AUC distributions for selected regulons

In [48]:
# select regulons:
r = [ 'RXRA_(+)', 'NFE2_(+)', 'ETV2_(+)' ]

fig, axs = plt.subplots(1, 3, figsize=(16, 5), dpi=150, sharey=False)
for i,ax in enumerate(axs):
    sns.distplot(auc_mtx[ r[i] ], ax=ax, norm_hist=True, bins=100)
    ax.plot( [ auc_thresholds[ r[i] ] ]*2, ax.get_ylim(), 'r:')
    ax.title.set_text( r[i] )
    ax.set_xlabel('')
    
fig.text(0.00, 0.5, 'Frequency', ha='center', va='center', rotation='vertical', size='x-large')
fig.text(0.5, 0.0, 'AUC', ha='center', va='center', rotation='horizontal', size='x-large')

fig.tight_layout()
#fig.savefig("PBMC10k_cellType-binaryPlot2.png", dpi=150, bbox_inches = "tight")
fig.savefig('PBMC10k_cellType-binaryPlot2.pdf', dpi=600, bbox_inches='tight')

Regulon specificity scores (RSS) across Louvain clusters

Calculate RSS

In [30]:
rss_louvain = regulon_specificity_scores( auc_mtx, cellAnnot['Louvain_clusters_Scanpy'] )
rss_louvain
Out[30]:
AHR_(+) AIRE_(+) AR_(+) ARID3A_(+) ARNT_(+) ARNTL2_(+) ASCL2_(+) ATF1_(+) ATF3_(+) ATF4_(+) ... ZNF721_(+) ZNF76_(+) ZNF768_(+) ZNF81_(+) ZNF831_(+) ZNF836_(+) ZNF891_(+) ZNF91_(+) ZSCAN2_(+) ZSCAN31_(+)
1 0.167702 0.134831 0.095345 0.173243 0.176672 0.195214 0.097510 0.194730 0.172422 0.194693 ... 0.177667 0.194938 0.185232 0.187188 0.219024 0.096556 0.147181 0.208766 0.153351 0.114661
0 0.292369 0.089581 0.097183 0.250708 0.237876 0.184571 0.089788 0.207092 0.283414 0.215001 ... 0.138432 0.187750 0.222504 0.128491 0.128016 0.092592 0.117640 0.169678 0.141016 0.112218
5 0.111614 0.098285 0.100686 0.122678 0.121547 0.130460 0.341674 0.128484 0.118155 0.119877 ... 0.109921 0.124783 0.124808 0.111611 0.125934 0.224524 0.105148 0.122637 0.109547 0.116081
3 0.137639 0.095607 0.092931 0.151788 0.140833 0.156044 0.095075 0.147224 0.134775 0.144525 ... 0.205381 0.155351 0.149708 0.182861 0.144645 0.087556 0.171538 0.154047 0.168770 0.150732
6 0.115832 0.121216 0.092222 0.116701 0.120063 0.116751 0.096055 0.126219 0.116109 0.123462 ... 0.127958 0.125634 0.122616 0.126901 0.132314 0.092214 0.121138 0.132062 0.115368 0.105001
2 0.140861 0.129850 0.093529 0.142128 0.152024 0.137654 0.094960 0.162442 0.141404 0.153925 ... 0.163973 0.161723 0.153721 0.166659 0.165640 0.093351 0.148424 0.172063 0.141981 0.126021
12 0.092917 0.088426 0.263248 0.098344 0.091275 0.097211 0.092761 0.093484 0.092625 0.096707 ... 0.091698 0.093541 0.092428 0.092529 0.088917 0.093548 0.088402 0.094451 0.089971 0.098086
4 0.129154 0.127483 0.100345 0.137555 0.136332 0.140719 0.125742 0.148760 0.135886 0.143357 ... 0.124077 0.148570 0.144278 0.126211 0.179422 0.112686 0.114744 0.145467 0.123312 0.122828
8 0.113363 0.088039 0.088830 0.113181 0.107979 0.105703 0.088072 0.107905 0.113810 0.106729 ... 0.095707 0.111864 0.107316 0.097530 0.094628 0.087556 0.093769 0.102857 0.104078 0.089097
10 0.100682 0.089472 0.105055 0.101714 0.100684 0.106963 0.087869 0.098252 0.100026 0.109152 ... 0.098801 0.101056 0.098890 0.102499 0.090766 0.088448 0.094808 0.096277 0.114418 0.087884
7 0.116657 0.091441 0.094110 0.125379 0.115367 0.135398 0.090600 0.120548 0.114470 0.122209 ... 0.142921 0.123518 0.120900 0.132470 0.114220 0.088743 0.141213 0.124040 0.126100 0.111707
9 0.100182 0.088569 0.087556 0.101080 0.101298 0.105130 0.107584 0.102781 0.100046 0.103816 ... 0.107300 0.103304 0.101754 0.106391 0.110681 0.095915 0.100551 0.106206 0.097869 0.095049
11 0.097879 0.087762 0.088944 0.096175 0.094880 0.095274 0.087556 0.095239 0.096857 0.096258 ... 0.092381 0.093324 0.094711 0.092007 0.090463 0.091447 0.088840 0.093683 0.091421 0.089271

13 rows × 375 columns

RSS panel plot with all cell types

In [32]:
cats = sorted( list(set(cellAnnot['Louvain_clusters_Scanpy'])), key=int )

fig = plt.figure(figsize=(15, 12))
for c,num in zip(cats, range(1,len(cats)+1)):
    x=rss_louvain.T[c]
    ax = fig.add_subplot(3,5,num)
    plot_rss(rss_louvain, c, top_n=5, max_n=None, ax=ax)
    ax.set_ylim( x.min()-(x.max()-x.min())*0.05 , x.max()+(x.max()-x.min())*0.05 )
    for t in ax.texts:
        t.set_fontsize(12)
    ax.set_ylabel('')
    ax.set_xlabel('')
    adjust_text(ax.texts, autoalign='xy', ha='right', arrowprops=dict(arrowstyle='-',color='lightgrey'), precision=0.001 )
 
fig.text(0.5, 0.0, 'Regulon', ha='center', va='center', size='x-large')
fig.text(0.00, 0.5, 'Regulon specificity score (RSS)', ha='center', va='center', rotation='vertical', size='x-large')
plt.tight_layout()
plt.rcParams.update({
    'figure.autolayout': True,
        'figure.titlesize': 'large' ,
        'xtick.labelsize':'large',
        'ytick.labelsize':'large'
        })
plt.savefig("PBMC10k_Louvain-RSS-top5.png", dpi=150, bbox_inches = "tight")
plt.show()

Select the top 5 regulons from each cell type

In [33]:
topreg = []
for i,c in enumerate(cats):
    topreg.extend(
        list(rss_louvain.T[c].sort_values(ascending=False)[:5].index)
    )
topreg = list(set(topreg))

Generate a heatmap

In [34]:
def palplot(pal, names, colors=None, size=1):
    n = len(pal)
    f, ax = plt.subplots(1, 1, figsize=(n * size, size))
    ax.imshow(np.arange(n).reshape(1, n),
              cmap=mpl.colors.ListedColormap(list(pal)),
              interpolation="nearest", aspect="auto")
    ax.set_xticks(np.arange(n) - .5)
    ax.set_yticks([-.5, .5])
    ax.set_xticklabels([])
    ax.set_yticklabels([])
    colors = n * ['k'] if colors is None else colors
    for idx, (name, color) in enumerate(zip(names, colors)):
        ax.text(0.0+idx, 0.0, name, color=color, horizontalalignment='center', verticalalignment='center')
    return f
In [36]:
colors = sns.color_palette('bright',n_colors=len(cats) )
colorsd = dict( zip( cats, colors ))
colormap = [ colorsd[x] for x in cellAnnot['Louvain_clusters_Scanpy'] ]
In [37]:
sns.set()
sns.set(font_scale=0.8)
fig = palplot( colors, cats, size=1.0)
/usr/local/lib/python3.7/site-packages/matplotlib/figure.py:2369: UserWarning: This figure includes Axes that are not compatible with tight_layout, so results might be incorrect.
  warnings.warn("This figure includes Axes that are not compatible "
In [38]:
g = sns.clustermap(auc_mtx_Z[topreg], annot=False,  square=False,  linecolor='gray',
    yticklabels=False, vmin=-2, vmax=6, row_colors=colormap,
    cmap="YlGnBu", figsize=(21,16) )
sns.set(font_scale=0.9)
g.cax.set_visible(True)
g.ax_heatmap.set_ylabel('')    
g.ax_heatmap.set_xlabel('')    
Out[38]:
Text(0.5, 313.60937500000006, '')

Further exploration of modules directly from the network inference output

In [39]:
adjacencies = pd.read_csv("adj.tsv", index_col=False, sep='\t')

Create the modules:

In [40]:
from pyscenic.utils import modules_from_adjacencies
modules = list(modules_from_adjacencies(adjacencies, exprMat))
/usr/local/lib/python3.7/site-packages/pyscenic/utils.py:138: RuntimeWarning: invalid value encountered in greater
  regulations = (rhos > rho_threshold).astype(int) - (rhos < -rho_threshold).astype(int)
/usr/local/lib/python3.7/site-packages/pyscenic/utils.py:138: RuntimeWarning: invalid value encountered in less
  regulations = (rhos > rho_threshold).astype(int) - (rhos < -rho_threshold).astype(int)

pick out modules for EBF1:

In [41]:
tf = 'EBF1'
tf_mods = [ x for x in modules if x.transcription_factor==tf ]

for i,mod in enumerate( tf_mods ):
    print( f'{tf} module {str(i)}: {len(mod.genes)} genes' )
print( f'{tf} regulon: {len(regulons[tf+"_(+)"])} genes' )
EBF1 module 0: 548 genes
EBF1 module 1: 303 genes
EBF1 module 2: 51 genes
EBF1 module 3: 85 genes
EBF1 module 4: 166 genes
EBF1 module 5: 478 genes
EBF1 regulon: 81 genes

write these modules, and the regulon to files:

In [42]:
for i,mod in enumerate( tf_mods ):
    with open( tf+'_module_'+str(i)+'.txt', 'w') as f:
        for item in mod.genes:
            f.write("%s\n" % item)
            
with open( tf+'_regulon.txt', 'w') as f:
    for item in regulons[tf+'_(+)']:
        f.write("%s\n" % item)

These module files can be further explored in iRegulon.

A detailed iRegulon tutorial is available here

This can produce, for example, the following image:

In [43]:
from IPython.display import display, Image
display(Image(filename='iRegulon_screenshot_PBMC10k-EBF1.png'))